Advanced semantics for accelerated graph processing
Large-scale graph applications are of great national, commercial, and societal importance, with direct use in fields such as counter-intelligence, proteomics, and data mining. Unfortunately, graph-based problems exhibit basic characteristics that make them a poor match for conventional computing systems in terms of structure, scale, and semantics. Graph processing kernels emphasize sparse data structures and computations with irregular memory access patterns that destroy the temporal and spatial locality upon which modern processors rely for performance. Furthermore, applications in this area utilize large data sets and have been shown to be more data intensive than typical floating-point applications, two properties that lead to inefficient utilization of the hierarchical memory system. Current approaches to processing large graph data sets leverage traditional HPC systems and programming models for shared-memory and message-passing computation, and are thus limited in efficiency, scalability, and programmability. The research presented in this thesis investigates the potential of a new model of execution that is hypothesized to be a promising alternative to conventional practices for graph-based applications, and develops a new approach to graph processing. The application of the experimental ParalleX execution model to graph processing balances continuation-migration-style fine-grain concurrency with constraint-based synchronization through embedded futures. A collection of parallel graph application kernels provides experiment control drivers for analysis and evaluation of this strategy. Finally, an experimental software library for scalable graph processing, the ParalleX Graph Library, is defined using the HPX runtime system, providing an implementation of the key concepts and a framework for the development of ParalleX-based graph applications.
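The combination the abstract describes — fine-grain per-vertex tasks synchronized through futures — can be sketched in miniature. ParalleX and HPX are C++ technologies; the following is only a Python analogue using the standard-library `concurrent.futures` module, with a hypothetical toy graph, to illustrate how consumers block only on the specific futures their computation depends on:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy graph as adjacency lists (not from the thesis).
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

def degree(v):
    return len(graph[v])

with ThreadPoolExecutor() as pool:
    # One future per vertex: each degree computation may run concurrently.
    deg = {v: pool.submit(degree, v) for v in graph}
    # Each consumer waits only on the futures it actually needs --
    # constraint-based synchronization through embedded futures, in miniature.
    neighbor_degree_sum = {v: sum(deg[u].result() for u in graph[v])
                           for v in graph}

print(neighbor_degree_sum)  # {0: 5, 1: 5, 2: 5, 3: 3}
```

In HPX the same pattern would use `hpx::future` and continuations, and the runtime could migrate work to where the data lives; the Python version captures only the dataflow-style synchronization.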
An Extensible Timing Infrastructure for Adaptive Large-scale Applications
Real-time access to accurate and reliable timing information is necessary to
profile scientific applications, and crucial as simulations become increasingly
complex, adaptive, and large-scale. The Cactus Framework provides flexible and
extensible capabilities for timing information through a well designed
infrastructure and timing API. Applications built with Cactus automatically
gain access to built-in timers, such as gettimeofday and getrusage,
system-specific hardware clocks, and high-level interfaces such as PAPI. We
describe the Cactus timer interface, its motivation, and its implementation. We
then demonstrate how this timing information can be used by an example
scientific application to profile itself, and to dynamically adapt itself to a
changing environment at run time.
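The idea of named timers backed by multiple clock sources, as the abstract describes for Cactus, can be illustrated with a minimal sketch. This is not the Cactus timer API; it is a hypothetical Python class that accumulates wall-clock time (via `time.monotonic`) and user-CPU time (via `resource.getrusage`, Unix-only) across start/stop pairs:

```python
import resource
import time

class Timer:
    """Accumulates wall-clock and user-CPU time across start/stop pairs.

    Illustrative only; the real Cactus timer interface is a C API with
    pluggable clocks (gettimeofday, getrusage, PAPI, ...).
    """
    def __init__(self, name):
        self.name = name
        self.wall = 0.0
        self.cpu = 0.0

    def start(self):
        self._w0 = time.monotonic()
        self._c0 = resource.getrusage(resource.RUSAGE_SELF).ru_utime

    def stop(self):
        self.wall += time.monotonic() - self._w0
        self.cpu += resource.getrusage(resource.RUSAGE_SELF).ru_utime - self._c0

t = Timer("evolve")
t.start()
sum(i * i for i in range(100_000))  # stand-in for one simulation step
t.stop()
print(f"{t.name}: wall={t.wall:.4f}s cpu={t.cpu:.4f}s")
```

An application that timed each of its phases this way could compare the accumulated values at run time and, as in the paper's example, adapt its behavior when one phase starts dominating.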
GA4GH: International policies and standards for data sharing across genomic research and healthcare.
The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.
Coronal Heating as Determined by the Solar Flare Frequency Distribution Obtained by Aggregating Case Studies
Flare frequency distributions represent a key approach to addressing one of
the largest problems in solar and stellar physics: determining the mechanism
that counter-intuitively heats coronae to temperatures that are orders of
magnitude hotter than the corresponding photospheres. It is widely accepted
that the magnetic field is responsible for the heating, but there are two
competing mechanisms that could explain it: nanoflares or Alfv\'en waves. To
date, neither can be directly observed. Nanoflares are, by definition,
extremely small, but their aggregate energy release could represent a
substantial heating mechanism, presuming they are sufficiently abundant. One
way to test this presumption is via the flare frequency distribution, which
describes how often flares of various energies occur. If the slope of the power
law fitting the flare frequency distribution is above a critical threshold,
as established in prior literature, then there should be a
sufficient abundance of nanoflares to explain coronal heating. We performed
600 case studies of solar flares, made possible by an unprecedented number
of data analysts via three semesters of an undergraduate physics laboratory
course. This allowed us to include two crucial, but nontrivial, analysis
methods: pre-flare baseline subtraction and computation of the flare energy,
which requires determining flare start and stop times. We aggregated the
results of these analyses into a statistical study and found that the slope
of the power-law fit falls below the critical threshold, suggesting that
Alfv\'en waves are an important driver of coronal heating.
Comment: 1,002 authors, 14 pages, 4 figures, 3 tables, published by The
Astrophysical Journal on 2023-05-09, volume 948, page 7
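The statistical test the abstract describes — fitting a power-law slope to a flare frequency distribution and comparing it to a critical threshold — can be sketched on synthetic data. The critical slope of 2 used below is the value commonly cited in the nanoflare-heating literature (the abstract says only "critical threshold"), and the energies, slope, and estimator here are illustrative, not the paper's data or method:

```python
import math
import random

random.seed(42)
E_min = 1.0
alpha_true = 1.6  # hypothetical slope, chosen below the critical value

# Draw energies from p(E) proportional to E^-alpha on [E_min, inf)
# via inverse-transform sampling: E = E_min * u^(-1/(alpha-1)).
energies = [E_min * (1 - random.random()) ** (-1 / (alpha_true - 1))
            for _ in range(50_000)]

# Maximum-likelihood slope estimate for a power law:
# alpha_hat = 1 + n / sum(ln(E_i / E_min)).
n = len(energies)
alpha_hat = 1 + n / sum(math.log(e / E_min) for e in energies)

print(f"alpha_hat = {alpha_hat:.3f}")
# alpha > 2 would imply nanoflares alone could supply the coronal heating.
print("nanoflares sufficient" if alpha_hat > 2 else "slope below threshold")
```

The paper's actual analysis additionally required pre-flare baseline subtraction and flare start/stop determination per event before the aggregate fit; none of that is modeled here.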